Reliable Single Chip Genotyping with Semi-Parametric Log-Concave Mixtures
Authors
Abstract
The common approach to SNP genotyping is to use (model-based) clustering per individual SNP, on a set of arrays. Genotyping all SNPs on a single array is much more attractive in terms of flexibility, stability and applicability when developing new chips. A new semi-parametric method, named SCALA, is proposed. It is based on a mixture model using semi-parametric log-concave densities. Instead of using the raw data, the mixture is fitted on a two-dimensional histogram, thereby making computation time almost independent of the number of SNPs. Furthermore, the algorithm is effective in low-MAF situations. Comparisons between SCALA and CRLMM on HapMap genotypes show very reliable calling of single arrays. Some heterozygous genotypes from HapMap are called homozygous by SCALA and, to a lesser extent, by CRLMM too. Furthermore, HapMap's NoCalls (NN) could be genotyped by SCALA, mostly with high probability. The software is available as R scripts from the website www.math.leidenuniv.nl/~rrippe.
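The histogram step mentioned in the abstract can be illustrated with a minimal sketch. This is not the SCALA implementation (which is distributed as R scripts); it only shows, under stated assumptions, why binning makes the size of the fitting problem independent of the number of SNPs. The function names `bin_signals` and `simulate_signals`, the 100 x 100 grid, and the simulated three-cluster data are all hypothetical choices made for illustration.

```python
import numpy as np


def bin_signals(x, y, bins=100):
    """Bin paired signals into a fixed-size two-dimensional histogram.

    The returned count matrix has shape (bins, bins) regardless of how
    many points are supplied, so any model fitted to the counts sees an
    input of constant size.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    return counts, x_edges, y_edges


def simulate_signals(n, seed=0):
    """Simulate n points from three clusters (hypothetical AA, AB, BB).

    Assumption: Gaussian clusters are used purely to generate toy data;
    they are not the log-concave components described in the abstract.
    """
    rng = np.random.default_rng(seed)
    centers = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
    labels = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
    points = centers[labels] + rng.normal(scale=0.3, size=(n, 2))
    return points[:, 0], points[:, 1]


# The count matrix has the same shape for 1,000 and 100,000 points.
small_counts, _, _ = bin_signals(*simulate_signals(1_000))
large_counts, _, _ = bin_signals(*simulate_signals(100_000))
print(small_counts.shape, large_counts.shape)
```

A mixture fitted to such a count matrix would treat each bin as one weighted observation, so the cost per iteration depends on the grid size rather than on the number of SNPs on the array.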
Similar resources
Non-parametric log-concave mixtures
Finite mixtures of parametric distributions are often used to model data of which it is known or suspected that there are subpopulations. Instead of a parametric model, a penalized likelihood smoothing algorithm is developed. The penalty is chosen to favor a log-concave result. The standard EM algorithm (“split and fit”) can be used. Theoretical results and applications are presented. © 2006 El...
Clustering with mixtures of log-concave distributions
The EM algorithm is a popular tool for clustering observations via a parametric mixture model. Two disadvantages of this approach are that its success depends on the appropriateness of the assumed parametric model, and that each model requires a different implementation of the EM algorithm based on model-specific theoretical derivations. We show how this algorithm can be extended to work with t...
Influence of waste tire chips on steady state behavior of sand
Materials such as waste tire chips were widely used to improve the strength of soil. The objective of this study is to discuss the residual strength or steady-state behavior of sand-waste tire chip mixtures. A series of undrained monotonic triaxial compression tests were conducted on reconstituted saturated specimens of sand and sand-tire chip mixtures with variation in the tire-chip contents f...
Inference and Modeling with Log-concave Distributions
Log-concave distributions are an attractive choice for modeling and inference, for several reasons: The class of log-concave distributions contains most of the commonly used parametric distributions and thus is a rich and flexible nonparametric class of distributions. Further, the MLE exists and can be computed with readily available algorithms. Thus, no tuning parameter, such as a bandwidth, i...
Distinguishing Log-Concavity from Heavy Tails
Well-behaved densities are typically log-convex with heavy tails and log-concave with light ones. We discuss a benchmark for distinguishing between the two cases, based on the observation that large values of a sum X1 + X2 occur as result of a single big jump with heavy tails whereas X1, X2 are of equal order of magnitude in the light-tailed case. The method is based on the ratio |X1 − X2|/(X1 ...